A challenge for programming language research is to design and implement multi-threaded low-level
languages providing static guarantees for memory safety and freedom from data races. Towards 
this goal, we present a concurrent language employing safe region-based memory management and 
hierarchical locking of regions. Both regions and locks are treated uniformly, and the language 
supports ownership transfer, early deallocation of regions and early release of locks in a safe manner. 

1 Introduction 

Writing safe and robust code is a hard task; writing safe and robust multi-threaded low-level code is even 
harder. In this paper we present a minimal, low-level concurrent language with advanced region-based 
memory management and hierarchical lock-based synchronization primitives. 

Region-based memory management achieves efficiency by bulk allocation and deallocation of ob- 
jects in segments of memory called regions. Similar to other approaches, our regions are organized in 
a hierarchical manner such that each region is physically allocated within a single parent region and 
may contain multiple child regions. This hierarchical structure imposes an ownership relation as well as 
lifetime constraints over regions. Unlike other languages employing hierarchical regions, our language 
allows early subtree deallocation in the presence of region sharing between threads. In addition, each 
thread is obliged to release each region it owns by the end of its scope. 

Multi-threaded programs that interact through shared memory generate random execution interleav- 
ings. A data race occurs in a multi-threaded program when there exists an interleaving such that some 
thread accesses a memory location while some other thread attempts to write to it. So far, type systems 
and analyses that guarantee race freedom [6] have mainly focused on lexically-scoped constructs. The 
key idea in those systems is to statically track or infer the lockset held at each program point. In the 
language presented in this paper, implicit reentrant locks are used to protect regions from data races. Our 
locking primitives are non-lexically scoped. Locks also follow the hierarchical structure of regions so 
that each region is protected by its own lock as well as the locks of all its ancestors. 

Furthermore, our language allows regions and locks to be safely aliased, escape the lexical scope 
when passed to a new thread, or become logically separated from the remaining hierarchy. These features 
are invaluable for expressing numerous idioms of multi-threaded programming such as sharing, region 
ownership or lock ownership transfers, thread-local regions and region migration. 

2 Language Design 

We briefly outline the main design goals for our language, as well as some of the main design decisions 
that we made to serve these goals. 

Low-level and concurrent. Our language must efficiently support systems programming. As such, it
should cater for memory management and concurrency. It also needs to be low-level: it is not intended 
to be used by programmers but as a target language of higher-level systems programming languages. 

Static safety guarantees. We define safety in terms of memory safety and absence of data races. A 
static type system should guarantee that well-typed programs are safe, with minimal run-time overhead. 

Safe region-based memory management. Similarly to other languages for safe systems program- 
ming (e.g. Cyclone) our language employs region-based memory management, which achieves effi- 
ciency by bulk allocation and deallocation of objects in segments of memory (regions). Statically typed 
regions [13, 14] guarantee the absence of dangling pointer dereferences, multiple release operations of
the same memory area, and memory leaks. Traditional stack-based regions [13] are limiting as they
cannot be deallocated early. Furthermore, the stack-based discipline fails to model region lifetimes in 
concurrent languages, where the lifetime of a shared region depends on the lifetime of the longest-lived 
thread accessing that region. In contrast, we want regions that can be deallocated early and that can 
safely be shared between concurrent threads. 

We opt for a hierarchical region [8] organization: each region is physically allocated within a sin- 
gle parent region and may contain multiple child regions. Early region deallocation in our multi-level 
hierarchy automatically deallocates the immediate subtree of a region without having to deallocate each 
region of the subtree recursively. The hierarchical region structure imposes the constraint that a child 
region is live only when its ancestors are live. In order to allow a function to access a region without 
having to pass all its ancestors explicitly, we allow ancestors to be abstracted (i.e., our language supports 
hierarchy abstraction) for the duration of the function call. To maintain the liveness invariant we require 
that the abstracted parents are live before and after the call. Regions whose parent information has been 
abstracted cannot be passed to a new thread as this may be unsound. 

Race freedom. To prevent data races we use lock-based mutual exclusion. Instead of having a separate 
mechanism for locks, we opt for a uniform treatment of locks and regions: locks are placed in the same 
hierarchy as regions and enjoy similar properties. Each region is protected by its own private lock and 
by the locks of its ancestors. The semantics of region locking is that the entire subtree of a region 
is atomically locked once the lock for that region has been acquired. Hierarchical locking can model 
complex synchronization strategies and lifts the burden of having to deal with explicit acquisition of 
multiple locks. Although deadlocks are possible, they can be avoided by acquiring a single lock for 
a group of regions rather than acquiring multiple locks for each region separately. Additionally, our 
language provides explicit locking primitives, which in turn allow a higher degree of concurrency than 
lexically-scoped locking, as some locks can be released early. 

Region polymorphism and aliasing. Our language supports region polymorphism: it is possible to 
pass regions as parameters to functions or concurrent threads. This enables region aliasing: one actual 
region could be passed in the place of two distinct formal region parameters. In the presence of mutual 
exclusion and early region deallocation, aliasing is dangerous. Our language allows safe region aliasing 
with minimal restrictions. The mechanism that we employ for this purpose also allows us to encode 
numerous useful idioms of concurrent programming, such as region migration, lock ownership transfers, 
region sharing, and thread-local regions. 

3 Language Features through Examples

Our regions are lexically-scoped first-class citizens; they are manipulated via explicit handles. For in- 
stance, a region handle can be used for releasing a region early, for allocating references and regions 
within it, or for locking it. Our language uses a type and effect system to guarantee that regions and their 
contents are properly used. The details will be made clear in Sections 4 and 6. Here, we present the
main features of our language through examples. We try to avoid technical issues as much as possible; 
however, some characteristics of the type and effect system are revealed in this section and their pres- 
ence is justified. Furthermore, to simplify the presentation in this section, we use abbreviations for a few 
language constructs that we expect the readers will find more intuitive. 

Example 1 (Simple Region Usage) This example shows a typical region use. New regions are allocated 
via the newrgn construct. This construct requires a handle to an existing region (heap in this case), in 
which the new region will be allocated, and introduces a type-level name (p) and a fresh handle (h) for 
the new region. The handle h is then used to allocate a new integer in region p ; a reference to this integer 
(z) is created. Finally, the region is deallocated before the end of its lexical scope. 

newrgn p, h at heap in        // {p^{1,1} ▷ p_H}
  let z = new 10 at h in
    z := deref z + 5;
    free h;                   // { }: empty effect, p is no longer alive



The comments on the right-hand side of the example's code show the current effect. An effect is roughly 
a set of capabilities that are held at a given program point. Right after creation of region p, the entry
p^{1,1} ▷ p_H is added to the effect; this means that a capability ("1, 1"; we will later explain what this
means) is held for region p, which resides in the heap region (p_H). Regions start their life as local to
a thread and their contents can be directly accessed. For instance, a reference z can be created in p, 
dereferenced and assigned a new value, as long as the type system can verify that a proper capability for 
p is present in the current effect. Deallocation of p removes the capability from the effect; once that is 
done, the region's contents become inaccessible. 

Example 2 (Hierarchical Regions) In the previous example a trivial hierarchy was created by allocating
region p within the heap region. It is possible to construct richer region hierarchies. As in the
previous example, the code below allocates a new region p1 within the heap. Other regions can then be
allocated within p1, e.g. p2; this can be done by passing the handle of p1 to the region creation construct.
Similarly, regions p3 and p4 can be allocated within region p2.



newrgn p1, h1 at heap in      // {p1^{1,1} ▷ p_H}
newrgn p2, h2 at h1 in        // {p1^{1,1} ▷ p_H, p2^{1,1} ▷ p1}
newrgn p3, h3 at h2 in        // {p1^{1,1} ▷ p_H, p2^{1,1} ▷ p1, p3^{1,1} ▷ p2}
newrgn p4, h4 at h2 in        // {p1^{1,1} ▷ p_H, p2^{1,1} ▷ p1, p3^{1,1} ▷ p2, p4^{1,1} ▷ p2}

[Figure: the resulting region hierarchy, with p1 inside p_H, p2 inside p1, and p3 and p4 inside p2.]

Our language allows regions to be allocated at any level of the hierarchy. For instance, it is possible to 
allocate more regions within region p1, in the lexical scope of region p4.

Example 3 (Bulk Region Deallocation) In the first example a single region was deallocated. That region
was a leaf node in the hierarchy; it contained no sub-regions. In the general case, when a region is
deallocated, the entire subtree below that region is also deallocated. This is what happens if, in the code
of the previous example, we deallocate region p2 within the innermost scope; regions p3 and p4 are also
deallocated. They are all removed from the current effect and thus are no longer accessible.



newrgn p1, h1 at heap in      // {p1^{1,1} ▷ p_H}
newrgn p2, h2 at h1 in        // {p1^{1,1} ▷ p_H, p2^{1,1} ▷ p1}
newrgn p3, h3 at h2 in        // {p1^{1,1} ▷ p_H, p2^{1,1} ▷ p1, p3^{1,1} ▷ p2}
newrgn p4, h4 at h2 in        // {p1^{1,1} ▷ p_H, p2^{1,1} ▷ p1, p3^{1,1} ▷ p2, p4^{1,1} ▷ p2}
free h2;                      // {p1^{1,1} ▷ p_H}
                              // p2, p3 and p4 are no longer alive

[Figure: the region hierarchy before and after free h2; the subtree rooted at p2 (p2, p3, p4) is removed.]
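
As a rough illustration of bulk deallocation, the following Python sketch removes a region and all of its descendants from an effect. The representation (a dictionary from region name to parent name) and the function name `free_subtree` are assumptions made for this example, not part of the language.

```python
def free_subtree(effect, region):
    """Return a copy of `effect` without `region` and its descendants.

    `effect` is a hypothetical encoding: region name -> parent name.
    """
    doomed = {region}
    changed = True
    while changed:  # keep collecting children until no new ones appear
        changed = False
        for child, parent in effect.items():
            if parent in doomed and child not in doomed:
                doomed.add(child)
                changed = True
    return {r: p for r, p in effect.items() if r not in doomed}


effect = {"p1": "heap", "p2": "p1", "p3": "p2", "p4": "p2"}
remaining = free_subtree(effect, "p2")   # only p1 survives
```

Freeing a leaf such as p3 removes just that region, whereas freeing p2 also removes p3 and p4 in a single step.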

Example 4 (Region Migration) A common multi-threaded programming idiom is to use thread-local 
data. At any time, only one thread will have access to such data and therefore no locking is required. 
A thread can transfer thread-local data to another thread but, doing so, it loses access to the data. This 
idiom is known as migration. Our language encodes thread-local data and data migration. As we have 
seen, newly created regions are considered thread-local; a capability for them is added to the current 
effect. We support data migration by allowing such capabilities to be transferred to other threads. 

The following example illustrates region migration. A server thread is defined, which executes an 
infinite loop. In every iteration, a new region is created and is initialized with client data. The contents 
of the region are then processed and finally transferred to a newly created (spawned) thread. 

def server = Λp_H. λheap.
  while (true) do
    newrgn p, h at heap in           // {p^{1,1} ▷ p_H}
      let z = wait_data[p](h) in     // region p is thread-local
        process(z);
        spawn output[p](h, z);       // {}: empty effect, p migrates to output
                                     // p cannot be accessed here

The server thread accepts the heap region and its handle. Within the infinite loop, it allocates a new 
region p in the heap. Its handle h is passed to function wait_data, which is supposed to fill the region 
p with client data (z). Function process is then called and works on the data. Until this point, region p 
is thread-local and accessible to the server thread, so no explicit locking is required. Now, let us assume 
that we want the processed data to be output by a different thread, e.g. to avoid an unnecessary delay on 
the server thread. A new thread output is spawned and receives the region handle h and the reference z 
to the client data. The capability p^{1,1} ▷ p_H is removed from the effect of server and is added to the input
effect of thread output. Therefore, region p has now become thread-local to thread output, which can 
access it directly, while it is no longer accessible to the server thread. 
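
The capability transfer that underlies migration can be sketched in Python as follows. This is a simplified model under our own assumptions: effects are dictionaries from region names to (region count, lock count) pairs, and `spawn_transfer` is a hypothetical helper.

```python
def spawn_transfer(parent_effect, region):
    """Move the capability for `region` from the spawning thread to the new one."""
    if region not in parent_effect:
        raise KeyError(f"no capability held for region {region!r}")
    # The capability is removed from the parent and becomes the child's input effect.
    return {region: parent_effect.pop(region)}


server = {"p": (1, 1)}                 # region p is thread-local to the server
output = spawn_transfer(server, "p")   # p migrates to the output thread
```

After the transfer the server's effect is empty, so any later access to p by the server would be rejected.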

Example 5 (Region Sharing) In the previous examples, capabilities for all regions were "1, 1" which, 
as we roughly explained, corresponds to thread-local. In general, a capability for a region consists of two 
natural numbers; the first denotes the region count, whereas the second denotes the lock count. When the 
region count is positive, the region is definitely alive. Similarly, when the lock count is positive, memory 
accesses to this region's contents are guaranteed to be race free. Capabilities with counts other than 1
can be used for sharing regions between threads. 

Multithreaded programs often share data for communication purposes. In this example, a server 
thread almost identical to that of the previous example is defined. The programmer's intention here, 
however, is to process the data and display it in parallel. Therefore, the output thread is spawned first 
and then the server thread starts processing the data. 

def server = Λp_H. λheap.
  while (true) do
    newrgn p, h at heap in           // {p^{1,1} ▷ p_H}
      let z = wait_data[p](h) in
        share h; unlock h;           // {p^{2,0} ▷ p_H}
        spawn output[p](h, z);       // {p^{1,0} ▷ p_H}: output consumes p^{1,0} ▷ p_H
        while (!finished) do
          lock h;                    // {p^{1,1} ▷ p_H}
          process(z);
          unlock h                   // {p^{1,0} ▷ p_H}

Operator share increases the region count and operator unlock decreases the lock count. As a consequence,
starting with capability p^{1,1} ▷ p_H, we end up with p^{2,0} ▷ p_H. When output is spawned, it
consumes "half" of this capability (p^{1,0} ▷ p_H); the remaining "half" (p^{1,0} ▷ p_H) is still held by the server
thread. Region p is now shared between the two threads; however, neither of them can access its data
directly, as this may lead to a data race. The lock and unlock operators have to be used for explicitly
locking and unlocking the region, before safely accessing its contents. Processing is now performed 
iteratively; the server thread avoids locking the region for long periods of time, thus allowing the output 
thread to execute a similar loop and gain access to the region when needed. 
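
The evolution of the capability counts in this example can be replayed with a few Python functions. This is a sketch of the static bookkeeping only: the function names are ours, capabilities are plain (region count, lock count) pairs, and the blocking behaviour of lock at run time is not modelled.

```python
def share(cap):
    rg, lk = cap
    return (rg + 1, lk)          # share: increment the region count

def lock(cap):
    rg, lk = cap
    return (rg, lk + 1)          # lock: increment the lock count

def unlock(cap):
    rg, lk = cap
    if lk == 0:
        raise ValueError("lock count is already zero")
    return (rg, lk - 1)          # unlock: decrement the lock count

def give(cap, part):
    """Hand `part` to a spawned thread and return what the spawner keeps."""
    rg, lk = cap
    prg, plk = part
    if prg > rg or plk > lk:
        raise ValueError("cannot give away more than is held")
    return (rg - prg, lk - plk)


cap = (1, 1)                     # fresh, thread-local region
cap = unlock(share(cap))
after_share = cap                # (2, 0)
cap = give(cap, (1, 0))
after_spawn = cap                # (1, 0): the server's remaining half
cap = lock(cap)
inside_loop = cap                # (1, 1): contents may be accessed
cap = unlock(cap)                # (1, 0) again
```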

Example 6 (Hierarchical Locking) In the previous example, locking and unlocking was performed on 
a leaf region. In general, locking a region in the hierarchy has the effect of atomically locking its subre- 
gions as well. A region is accessible when it has been locked by the current thread or when at least one 
of its ancestors has been locked. 

Hierarchical locking can be useful when a set of locks needs to be acquired atomically. In this 
example, we assume that two hash tables (tbl1 and tbl2) are used. An object with a given key must be
removed from tbl1, which resides in region p1, and must be inserted in tbl2, which resides in region p2.
We can atomically acquire access to both regions p1 and p2, by locking a common ancestor of theirs.

lock h;      // h is the handle of a common ancestor of p1 and p2
let obj = hash_remove[p1](tbl1, key) in
  hash_insert[p2](tbl2, key, obj);
unlock h
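
The accessibility rule behind hierarchical locking (a region is accessible when it, or one of its ancestors, is locked by the current thread) can be sketched as a walk up the hierarchy. The dictionary encoding and the region name `p0` for the common ancestor are assumptions made for this illustration.

```python
def is_accessible(region, parents, locked):
    """True if `region` or any of its ancestors is in the set `locked`."""
    node = region
    while node is not None:
        if node in locked:
            return True
        node = parents.get(node)   # None once we step past the root
    return False


# p0 is a hypothetical common ancestor of p1 and p2.
parents = {"p0": "heap", "p1": "p0", "p2": "p0"}
locked = {"p0"}                    # a single lock covers both hash tables
both = is_accessible("p1", parents, locked) and is_accessible("p2", parents, locked)
```

Locking p0 alone makes both p1 and p2 accessible, which is why no second lock operation is needed in the example.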

Example 7 (Region Aliasing) An expressive language with regions will have to support region poly- 
morphism, which invariably leads to region aliasing. This must be handled with caution, as a naive 
approach may cause unsoundness. In the examples that follow, we discuss how region aliasing is used in 
our language as well as the restrictions that we impose to guarantee safety. 

Function swap accepts two integer references, residing in regions p1 and p2, and swaps their contents.
It assumes that both regions are already locked and remain locked when the function returns. 

def swap = Λp1. Λp2. λ(x : ref(p1, int), y : ref(p2, int)).   // p1 and p2 must both be locked
  let z = deref x in     // OK: p1 is locked
    x := deref y;        // OK: p1 and p2 are locked
    y := z               // OK: p2 is locked

In order to instantiate p1 and p2 with the same region p, we can create two lock capabilities by using
the lock operator twice on p's handle h. Of course, the second use of lock will succeed immediately, 
as the region has already been locked by the same thread. 

                         // {p^{2,0} ▷ p_H}
lock h; lock h;          // {p^{2,2} ▷ p_H}
swap[p][p](a, b);        // each p parameter requires p^{1,1} ▷ p_H
unlock h; unlock h       // {p^{2,0} ▷ p_H}

Example 8 (Reentrant locks) Region aliasing introduces the need for reentrant locks. To see this, let 
us change the swapping function of the previous example, so that it receives two references in unlocked 
regions. For swapping their contents, it will have to acquire locks for the two regions (and release them, 
when they are no longer needed). 

def swap = Λp1. Λp2. λ(h1 : rgn(p1), h2 : rgn(p2),
                        x : ref(p1, int), y : ref(p2, int)).   // p1 and p2 are unlocked
  lock h1;
  let z = deref x in     // OK: p1 is locked
    lock h2;
    x := deref y;        // OK: p1 and p2 are locked
    unlock h1;
    y := z;              // OK: p2 is locked
    unlock h2            // all locks can be released

Suppose again that we are to instantiate p1 and p2 with the same region p.

...                        // {p^{2,0} ▷ p_H}
swap[p][p](h, h, a, b);    // each p parameter requires p^{1,0} ▷ p_H

We can easily see, however, that the run-time system cannot use binary locks; in that case, swap[p][p] 
would either come to a deadlock, waiting to obtain once more the lock that it has already acquired, or,
worse, it might release the lock early (at unlock h1) and allow a data race to occur. To avoid
unsoundness, we use reentrant locks: lock counts are important both for static typing and for the run- 
time system. A lock with a positive run-time count can immediately be acquired again, if it was held by 
the same thread. Moreover, a lock is released only when its run-time count becomes zero. 
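
The run-time behaviour of such a reentrant lock can be sketched in Python as a counter with an owner. This is a single-threaded simulation with hypothetical names (`ReentrantRegionLock`, `try_lock`); a real implementation would block instead of returning False. Python's standard `threading.RLock` follows the same counting discipline.

```python
class ReentrantRegionLock:
    """Counting lock: released only when its run-time count drops to zero."""

    def __init__(self):
        self.owner = None
        self.count = 0

    def try_lock(self, thread_id):
        # Succeeds immediately if the lock is free or already held by the caller.
        if self.owner is None or self.owner == thread_id:
            self.owner = thread_id
            self.count += 1
            return True
        return False               # a real lock would block here

    def unlock(self, thread_id):
        if self.owner != thread_id or self.count == 0:
            raise RuntimeError("lock not held by this thread")
        self.count -= 1
        if self.count == 0:
            self.owner = None      # only now can another thread acquire it


# Replaying swap[p][p]: h1 and h2 are the same handle.
lk = ReentrantRegionLock()
lk.try_lock("t1")                  # lock h1
lk.try_lock("t1")                  # lock h2: succeeds, count becomes 2
lk.unlock("t1")                    # unlock h1: count 1, still held
other_got_it_early = lk.try_lock("t2")
lk.unlock("t1")                    # unlock h2: count 0, released
other_got_it_late = lk.try_lock("t2")
```

With a binary lock the second acquisition would deadlock, or the first unlock would release the region too early; with the counter, the other thread can only acquire the lock after both unlocks.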

Example 9 (Pure and Impure Capabilities) Unrestricted region aliasing leads to unsoundness. Consider
function bad, which accepts two integer references (x and y) in regions p1 and p2, which are both
locked. It lets p1 migrate to a new thread and passes x as a parameter. It then assigns a value to y.

def bad = Λp1. Λp2. λ(x : ref(p1, int), y : ref(p2, int)).   // p1 and p2 must both be locked
  spawn f[p1](x);        // p1 migrates to f while locked
  y := 1                 // OK: p2 is still locked (WRONG!)

A data race may occur if we call bad as follows; both threads have access to a, each holding a lock for
p.

bad[p][p](a, a);         // each p parameter requires p^{1,1} ▷ p_H

Value             v ::= f | c | rgn_ι | loc_ℓ
Function          f ::= λx. e as τ →^{γ→γ'} τ | Λp. f
Expression        e ::= x | c | f | (e e)^ξ | e[r] | new e at e | e := e
                      | deref e | newrgn p, x at e in e | cap_η e | rgn_ι | loc_ℓ
Type              τ ::= b | τ →^{γ→γ'} τ | ∀p. τ | ref(τ, r) | rgn(r)
Effect            γ ::= ∅ | γ, r^κ ▷ π
Calling mode      ξ ::= seq | par(γ)
Capability op     η ::= ψ+ | ψ−
Capability kind   ψ ::= rg | lk
Capability        κ ::= n,n | \overline{n,n}
Region parent     π ::= r | ⊥ | ?
Region            r ::= p | ι

Figure 1: Syntax. 



The cause of the unsoundness is that, in this last call to bad[p][p], we allowed a single capability
p^{2,2} ▷ p_H to be divided into two distinct capabilities p^{1,1} ▷ p_H. More specifically, we divided the lock count
in two and created two distinct lock capabilities, one of which escaped to a different thread through
region migration. To resolve the unsoundness, we introduce the notion of pure (i.e., full) and impure (i.e.,
divided) capabilities. For instance, p^{2,2} ▷ p_H is a pure capability; when we divide it we obtain two impure
halves, which we denote as p^{\overline{1,1}} ▷ p_H. Impure capabilities cannot be given to newly spawned threads
when their lock count is positive. In contrast with pure capabilities, they represent inexact knowledge of
a region's counts.
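
The restriction on impure capabilities can be illustrated with a small Python sketch. The class `Cap` and the helpers `split` and `may_migrate` are hypothetical names introduced here; the sketch only mirrors the rule stated above and is not the type system itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cap:
    rg: int             # region count
    lk: int             # lock count
    pure: bool = True   # False once the capability has been divided


def split(cap, rg, lk):
    """Divide `cap` into an (rg, lk) piece and the remainder; both are impure."""
    if rg > cap.rg or lk > cap.lk:
        raise ValueError("cannot split off more than is held")
    return Cap(rg, lk, pure=False), Cap(cap.rg - rg, cap.lk - lk, pure=False)


def may_migrate(cap):
    """An impure capability with a positive lock count must not reach a new thread."""
    return cap.pure or cap.lk == 0


half_a, half_b = split(Cap(2, 2), 1, 1)        # the two halves used by bad[p][p]
bad_is_rejected = not may_migrate(half_a)      # spawn f[p1](x) is not allowed
unlocked_half, _ = split(Cap(2, 0), 1, 0)      # the sharing idiom of Example 5
sharing_is_allowed = may_migrate(unlocked_half)
```

Under this rule the spawn inside bad is rejected when both parameters alias the same region, while migration of a pure (1,1) capability and sharing of unlocked halves remain possible.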

4 Language Description 

The syntax of the language is illustrated in Figure 1. The language core comprises variables (x),
constants (c), functions, and function application. Functions can be region polymorphic (Λp. f) and
region application is explicit (e[p]). Monomorphic functions (λx. e) must be annotated with their type.
The application of monomorphic functions is annotated with a calling mode (ξ), which is seq for normal
(sequential) application and par(γ) for spawning a new thread.¹ Parallel application is annotated with
the input effect of the new thread (γ); this annotation can be automatically inferred by the type checker.
The constructs for manipulating references are standard. A newly allocated memory cell is returned by
new e1 at e2, where e1 is the value that will be placed in the cell and e2 is a handle of the region in which
the new cell will be allocated. Standard assignment and dereference operators complete the picture.

The construct newrgn p, x at e1 in e2 allocates a new region p and binds x to the region handle.
The new region resides in a parent region, whose handle is given in e1. The scope of p and x is e2,
which must consume the new region by the end of its execution. A region can be consumed either by
deallocation or by transferring its ownership to another thread. At any given program point, each region
is associated with a capability (κ). Capabilities consist of two natural numbers, the capability counts:
the region count and lock count, which denote whether a region is live and locked respectively. When 
first allocated, a region starts with capability (1,1), meaning that it is live and locked, so that it can 
be exclusively accessed by the thread that allocated it. As we have seen, this is our equivalent of a 
thread-local region. 

By using the construct cap_η e, a thread can increment or decrement the capability counts of the
region whose handle is specified in e. The capability operator η can be, e.g., rg+ (meaning that the
region count is to be incremented) or lk− (meaning that the lock count is to be decremented).² When a
region count reaches zero, the region may be physically deallocated and no subsequent operations can

¹ In the examples of Section 3 we used more intuitive notation: we omitted seq and used the keyword spawn instead of par.
² The region manipulation operators used in Section 3 are simple abbreviations: share = cap_{rg+}, unlock = cap_{lk−}, etc.

Configuration        C ::= S; T
Threads              T ::= ∅ | T, n : e
Store                S ::= ∅ | S, ι : (θ, H, S)
Thread map           θ ::= ∅ | θ, n ↦ n1, n2
Memory heap          H ::= ∅ | H, ℓ ↦ v
Evaluation context   E ::= □ | … | new E at e | new v at E
                         | deref E | E := e | v := E

Figure 2: Configurations, store, threads and evaluation contexts. 



be performed on it. When a lock count reaches zero, the region is unlocked. As we explained, capability 
counts determine the validity of operations on regions and references. All memory-related operations 
require that the involved regions are live, i.e., the region count is greater than zero. Assignment and 
dereference can be performed only when the corresponding region is live and locked. 

A capability of the form (n1, n2) is called a pure capability, whereas a capability of the form (\overline{n1, n2})
is called an impure capability. In both cases, it is implied that the current thread can decrement the
region count n1 times and the lock count n2 times. Impure capabilities are obtained by splitting pure or
other impure capabilities into several pieces, e.g., (3,2) = (\overline{2,1}) + (\overline{1,1}), in the same spirit as fractional
capabilities [4]. As we explained in Example 9 of Section 3, these pieces are useful for region aliasing,
when the same region is to be passed to a function in the place of two distinct region parameters. An 
impure capability implies that our knowledge of the region and lock count is inexact. The use of such 
capabilities must be restricted; e.g., an impure capability with a non-zero lock count cannot be passed to 
another thread, as it is unsound to allow two threads to simultaneously hold the same lock. Capability 
splitting takes place automatically with function application. 

5 Operational Semantics 

We define a small-step operational semantics for our language, using two evaluation relations, at the level 
of threads and expressions (Figures 3 and 4). The thread evaluation relation transforms
configurations. A configuration C (see Figure 2) consists of an abstract store S and a list of threads T.³
Each thread in T is of the form n : e, where n is a thread identifier and e is an expression. The store is a
list of regions of the form ι : (θ, H, S), where ι is a region identifier, θ is a thread map, H is a memory
heap and S is the list of subregions in the region hierarchy. The thread map associates thread identifiers
with capability counts for region ι, whereas the memory heap represents the region's contents, mapping
locations to values. 

A thread evaluation context E (Figure 2) is defined as an expression with a hole, represented as □.
The hole indicates the position where the next reduction step can take place. Our notion of evaluation
context imposes a call-by-value evaluation strategy on our language. Subexpressions are evaluated in a
left-to-right order. 

We assume that concurrent reduction events can be totally ordered. At each step, an arbitrary thread
(n) is chosen from the thread list for evaluation (Figure 3). It should be noted that the thread evaluation
rules are the only non-deterministic rules in the operational semantics of our language; in the presence of
more than one active thread, our semantics does not specify which one will be selected for evaluation.
Threads that have completed their evaluation and have been reduced to unit values, represented as (), 
are removed from the active thread list (rule E-T). Rule E-S reduces some thread n via the expression 

³ The order of elements in comma-separated lists, e.g. in a store S or in a list of threads T, is not important; we consider all
list permutations as equivalent. 
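
e' ≡ ((λx. e as τ) v)^{par(γ1)}      e'' ≡ ((λx. e as τ) v)^{seq}
fresh n'      S' = transfer(S, n, n', γ1)
----------------------------------------------------------------- (ESN)
S; T, n : E[e'] ⇝ S'; T, n : E[()], n' : e''

S; e →_n S'; e'
----------------------------------------- (E-S)
S; T, n : E[e] ⇝ S'; T, n : E[e']

----------------------------------------- (E-T)
S; T, n : () ⇝ S; T

Figure 3: Thread evaluation relation C ⇝ C'.

----------------------------------------- (E-A)
S; ((λx. e as τ) v)^{seq} →_n S; e[v/x]

----------------------------------------- (E-RP)
S; (Λp. f)[r] →_n S; f[r/p]

(S', k) = newrgn(S, n, j)
------------------------------------------------------------ (E-NG)
S; newrgn p, x at rgn_j in e →_n S'; e[k/p][rgn_k/x]

S' = updcap(S, η, j, n)
----------------------------------------- (E-C)
S; cap_η rgn_j →_n S'; ()

(S', ℓ) = alloc(j, S, v)
----------------------------------------- (E-NR)
S; new v at rgn_j →_n S'; loc_ℓ

S' = update(S, ℓ, v, n)
----------------------------------------- (E-AS)
S; loc_ℓ := v →_n S'; ()

v = lookup(S, ℓ, n)
----------------------------------------- (E-D)
S; deref loc_ℓ →_n S; v

Figure 4: Expression evaluation relation S; e →_n S'; e'.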



evaluation relation. When a parallel function application redex is detected within the evaluation context 
of a thread, a new thread is created (rule ESN). The redex is replaced with a unit value in the currently
executed thread and a new thread is added to the thread list, with a fresh thread identifier. The
partial function transfer(S, n, n', γ1) updates the thread maps of all regions specified in γ1, transferring
capabilities between threads n and n'. It is undefined when this transfer is not possible.

The expression evaluation relation is defined in Figure 4. The rules for reducing function application
(E-A) and region application (E-RP) are standard. The remaining rules make use of five partial functions
that manipulate the store. These functions are undefined when their constraints are not met. All of them
require that some region is live. A region is live when the sum of all region counts in the thread map
associated with that region is positive and all ancestors of the region are live as well. In addition to
liveness, some of these functions require that some region is accessible to the currently executed thread.
Region r is accessible to some thread n (and inaccessible to all other threads) when r is live and the
thread map associated with r, or with some ancestor of r, maps n to a positive lock count.

• alloc(j, S, v) is used in rule E-NR for creating a new reference. It allocates a new object in S. The
object is placed in region j and is set to value v. Region j must be live. Upon success, the function
returns a pair (S', ℓ) containing the new store and a fresh location for the new object.

• lookup(S, ℓ, n) is used in rule E-D to look up the value of location ℓ in S. The region in which ℓ
resides must be accessible to the currently executed thread n. Upon success, the function returns
the value v stored at ℓ.

• update(S, ℓ, v, n) is used in rule E-AS to assign the value v to location ℓ in S. The region in which
ℓ resides must be accessible to the currently executed thread n. Upon success, the function returns
the new store S'.

• newrgn(S, n, j) is used in rule E-NG to create a new region in S. The new region is allocated
within j, which must be live. Its thread map is set to n ↦ 1,1. Upon success, the function returns
a pair (S', k) containing the new store and a fresh region name for the new region.

• updcap(S, η, j, n) is used in rule E-C. This operation updates S by modifying the region or lock
count of thread n for region j. Upon success, the function returns the new store S'. When a lock
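
…
------------------------------------------------------------ (T-F)
R; M; Δ; Γ ⊢ λx. e as τ : τ & (γ; γ)

…      R; M; Δ; Γ ⊢ e2 : τ1 & (γ'; γ'')      ξ ⊢ γ''' = γ2 ⊕ (γ'' ⊖ γ1)
------------------------------------------------------------ (T-AP)
R; M; Δ; Γ ⊢ (e1 e2)^ξ : τ2 & (γ; γ''')

R; M; Δ; Γ ⊢ e : ref(τ, r) & (γ; γ')      is_accessible(γ', r)
------------------------------------------------------------ (T-D)
R; M; Δ; Γ ⊢ deref e : τ & (γ; γ')

R; M; Δ; Γ ⊢ e1 : rgn(r) & (γ; γ')      r ∈ dom(γ')      R; Δ ⊢ τ
R; M; Δ, p; Γ, x : rgn(p) ⊢ e2 : τ & (γ', p^{1,1} ▷ r; γ'')      p ∉ dom(γ'')
------------------------------------------------------------ (T-NG)
R; M; Δ; Γ ⊢ newrgn p, x at e1 in e2 : τ & (γ; γ'')

R; M; Δ; Γ ⊢ e1 : τ & (γ; γ')
R; M; Δ; Γ ⊢ e2 : rgn(r) & (γ'; γ'')      r ∈ dom(γ'')
------------------------------------------------------------ (T-NR)
R; M; Δ; Γ ⊢ new e1 at e2 : ref(τ, r) & (γ; γ'')

…      κ' = [[η]](κ)      γ'' = live(γ', r^{κ'} ▷ π)
------------------------------------------------------------ (T-CP)
R; M; Δ; Γ ⊢ cap_η e1 : () & (γ; γ'')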

Figure 5: Selected typing rules. 



update is requested and the lock is held by another thread, the result is undefined. In this case, rule 
E-C cannot be applied and the operation will block, until the lock is available. 
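
The liveness and accessibility conditions used by these store functions can be sketched in Python. The sketch simplifies the store to region objects with parent pointers (the paper nests subregions inside their parent instead), and the names `Region`, `is_live`, `is_accessible` and `lookup` are our own; mutual exclusion between threads is assumed rather than enforced.

```python
class Region:
    """Hypothetical store entry: thread map, memory heap and a parent pointer."""

    def __init__(self, parent=None):
        self.parent = parent
        self.thread_map = {}   # thread id -> (region_count, lock_count)
        self.heap = {}         # location -> value


def is_live(region):
    # Live: positive total region count here and in every ancestor.
    node = region
    while node is not None:
        if sum(rg for rg, _ in node.thread_map.values()) <= 0:
            return False
        node = node.parent
    return True


def is_accessible(region, n):
    # Accessible to n: live, and n holds a lock on the region or an ancestor.
    if not is_live(region):
        return False
    node = region
    while node is not None:
        if node.thread_map.get(n, (0, 0))[1] > 0:
            return True
        node = node.parent
    return False


def lookup(region, loc, n):
    # Partial function: undefined (here, an exception) if the region is not accessible.
    if not is_accessible(region, n):
        raise PermissionError("region not accessible; the semantics would be stuck")
    return region.heap[loc]


heap = Region()
heap.thread_map = {1: (1, 1)}              # thread 1 holds the heap's lock
p = Region(parent=heap)
p.thread_map = {1: (1, 0), 2: (1, 0)}      # p is shared and unlocked
p.heap["loc0"] = 42
value = lookup(p, "loc0", 1)               # allowed through the ancestor's lock
```

Thread 2 holds no lock on p or on any ancestor, so a lookup on its behalf raises an error, which corresponds to the stuck configuration discussed next.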

The operational semantics may get stuck when a deadlock occurs. Our semantics will also get stuck 
when a thread attempts to access a memory location without having acquired an appropriate lock for this 
location. In this case, update(S, ℓ, v, n) and lookup(S, ℓ, n) are undefined and it is impossible to perform a
single step via rules E-AS or E-D. The same is true in several other situations (e.g. when referring to a 
non-existent region or location). Threads that may cause a data race will definitely get stuck. 

We follow a different approach from related work, e.g. the work of Grossman [10], where a special
kind of value junk_v is often used as an intermediate step when assigning a value v to a location, before
the real assignment takes place, and type safety guarantees that no junk values are ever used. As described
above, we use a more direct approach by incorporating the locking mechanism in the operational
semantics. Our progress lemma in Section 7 guarantees that, at any time, all threads can make progress
and, therefore, a possible implementation does not need to check liveness or accessibility at run-time. 

6 Static Semantics 

In this section we discuss the most interesting parts of our type system. As we sketched in Section 3, to
enforce our safety invariants, we use a type and effect system. Effects are used to statically track region
capabilities. An effect ($\gamma$) is a list of elements of the form $r^{\kappa} \triangleright \pi$, denoting that region $r$ is associated with
capability $\kappa$ and has parent $\pi$, which can be another region, $\bot$, or $?$. Regions whose parents are $\bot$ or $?$
are considered as roots in our region hierarchy. We assume that there is an initial (physical) root region
corresponding to the entire heap, whose handle is available to the main program. The parent of the heap
region is $\bot$. More (logical) root regions can be created using hierarchy abstraction. The abstract parent
of a region that is passed to a function is denoted by $?$.

The syntax of types (given in the figure on page 85) is more or less standard. A collection of base types $b$ is
assumed; the syntax of values belonging to these types and operations upon such values are omitted from
this paper. We assume the existence of a unit base type, which we denote by $()$. Region handle types
$\mathit{rgn}(r)$ and reference types $\mathit{ref}(\tau, r)$ are associated with a type-level region $r$. Monomorphic function
types carry an input and an output effect. A well-typed expression $e$ has a type $\tau$ under an input effect $\gamma$
and results in an output effect $\gamma'$. The typing relation (see Figure 5) is denoted by $R; M; \Delta; \Gamma \vdash e : \tau \mathbin{\&} (\gamma; \gamma')$
and uses four typing contexts: a set of region literals ($R$), a mapping of locations to types ($M$), a set of
region variables ($\Delta$), and a mapping of term variables to types ($\Gamma$). The effects that appear in our typing
relation must satisfy a liveness invariant: all regions that appear in the effect are live, i.e., their region
counts and those of all their ancestors are positive. Thus, in order to check if a region $r$ is live in the
effect $\gamma$, we only need to check that $r \in \mathit{dom}(\gamma)$.

[Figure 6: Effect and capability splitting.]

\[
\frac{(r^{\kappa} \triangleright \pi) \in \gamma \quad rg(\kappa) > 0 \quad \pi \in \{\bot, ?\}}{\mathit{is\_live}(\gamma, r)}
\qquad
\frac{(r^{\kappa} \triangleright r') \in \gamma \quad rg(\kappa) > 0 \quad \mathit{is\_live}(\gamma, r')}{\mathit{is\_live}(\gamma, r)}
\]
\[
\frac{(r^{\kappa} \triangleright \pi) \in \gamma \quad lk(\kappa) > 0}{\mathit{is\_accessible}(\gamma, r)}
\qquad
\frac{(r^{\kappa} \triangleright r') \in \gamma \quad \mathit{is\_accessible}(\gamma, r')}{\mathit{is\_accessible}(\gamma, r)}
\]

Figure 7: Auxiliary predicates: region liveness and accessibility.

\begin{align*}
rg(\kappa) &= n_1 \quad \text{if } \kappa = n_1, n_2 \lor \kappa = \overline{n_1, n_2} \\
lk(\kappa) &= n_2 \quad \text{if } \kappa = n_1, n_2 \lor \kappa = \overline{n_1, n_2} \\
\mathit{dom}(\gamma) &= \{ r \mid (r^{\kappa} \triangleright \pi) \in \gamma \} \\
\mathit{live}(\gamma) &= \{ r^{\kappa} \triangleright \pi \mid (r^{\kappa} \triangleright \pi) \in \gamma \land \mathit{is\_live}(\gamma, r) \} \\
\mathit{is\_pure}(\kappa) &= \exists n_1 .\, \exists n_2 .\, \kappa = n_1, n_2 \\
\mathit{consistent}(\gamma_1; \gamma_2) &= (\forall (r^{\kappa} \triangleright \pi) \in \gamma_1 .\, \forall (r^{\kappa'} \triangleright \pi') \in \gamma_2 .\, \pi = \pi' \land (\mathit{is\_pure}(\kappa) \Leftrightarrow \mathit{is\_pure}(\kappa'))) \\
&\quad \land \mathit{dom}(\gamma_2) \subseteq \mathit{dom}(\gamma_1) \land \mathit{live}(\gamma_1) = \gamma_1 \land \mathit{live}(\gamma_2) = \gamma_2 \\
\mathit{abs\_par}(\gamma_1; \gamma_2) &= \{ r' \mid (r^{\kappa} \triangleright r') \in \gamma_1 \land (r^{\kappa'} \triangleright\, ?) \in \gamma_2 \}
\end{align*}

Figure 8: Auxiliary functions and predicates.
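
The auxiliary predicates of Figures 7 and 8 are simple recursive walks up the region hierarchy, which the following Python sketch tries to make concrete. The encoding is an assumption made for illustration: an effect is a dictionary from region names to a pair of a capability and a parent, a capability is a triple `(rg, lk, pure)`, and the constants `BOT` and `ABS` stand for the root parents $\bot$ and $?$.

```python
from typing import Dict, Tuple

BOT, ABS = "bot", "?"                    # hypothetical encodings of the two root parents
Cap = Tuple[int, int, bool]              # (region count, lock count, is_pure)
Effect = Dict[str, Tuple[Cap, str]]      # region -> (capability, parent)


def rg(k: Cap) -> int:
    return k[0]


def lk(k: Cap) -> int:
    return k[1]


def is_pure(k: Cap) -> bool:
    return k[2]


def is_live(g: Effect, r: str) -> bool:
    """r and all of its ancestors in g have a positive region count."""
    if r not in g:
        return False
    k, parent = g[r]
    if rg(k) <= 0:
        return False
    return parent in (BOT, ABS) or is_live(g, parent)


def is_accessible(g: Effect, r: str) -> bool:
    """r itself or one of its ancestors in g has a positive lock count."""
    if r not in g:
        return False
    k, parent = g[r]
    if lk(k) > 0:
        return True
    return parent not in (BOT, ABS) and is_accessible(g, parent)


def dom(g: Effect) -> set:
    return set(g)


def live(g: Effect) -> Effect:
    """Keep only the entries of g whose region is live."""
    return {r: e for r, e in g.items() if is_live(g, r)}


def consistent(g1: Effect, g2: Effect) -> bool:
    """Shared regions keep their parent and purity; g2 is within g1; both are live."""
    same = all(
        g1[r][1] == g2[r][1] and is_pure(g1[r][0]) == is_pure(g2[r][0])
        for r in dom(g1) & dom(g2)
    )
    return same and dom(g2) <= dom(g1) and live(g1) == g1 and live(g2) == g2


def abs_par(g1: Effect, g2: Effect) -> set:
    """Concrete parents in g1 of regions whose parent is abstracted in g2."""
    return {
        g1[r][1]
        for r in dom(g1) & dom(g2)
        if g1[r][1] not in (BOT, ABS) and g2[r][1] == ABS
    }
```

For example, in an effect where region `a` is locked and region `b` is a child of `a`, `is_accessible` holds for `b` even though the lock count of `b` itself is zero, which is the hierarchical locking behaviour described in the text.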

The typing rule for lambda abstraction (T-F) requires that the body $e$ is well-typed with respect to the
effects ascribed on its type. The typing rule for function application (T-AP) splits the output effect of $e_2$
($\gamma''$) by subtracting the function's input effect ($\gamma_1$). It then joins the remaining effect with the function's
output effect ($\gamma_2$). In the case of parallel application, rule T-AP also requires that the return type is
unit. The splitting and joining of effects is controlled by the judgement $\xi \vdash \gamma'' = \gamma_2 \oplus (\gamma \ominus \gamma_1)$, which is
defined in Figure 6 (the auxiliary functions and predicates are defined in Figures 7 and 8). It enforces the
following properties:

• the liveness invariant for $\gamma''$;

• the consistency of $\gamma$ and $\gamma''$, i.e., regions cannot change parent and capabilities cannot switch from
pure to impure or vice versa; the domain of $\gamma''$ is a subset of the domain of $\gamma$;

• for sequential application, all parent regions that become abstracted for the duration of the function
call must be live after the function returns;

• for parallel application, the thread output effect must be empty, the thread input effect must not
contain impure capabilities with positive lock counts and hierarchy abstraction is disallowed.

The typing rules for references are standard. In Figure 5 we only show the rules for dereference
(T-D) and reference allocation (T-NR). The former checks that region $r$ is accessible. The latter only
checks that the region $r$ is live. The rule for creating new regions (T-NG) checks that $e_1$ is a handle for
some live region $r'$. Expression $e_2$ is type checked in an extended typing context (i.e., $\rho$ and $x : \mathit{rgn}(\rho)$
are appended to $\Delta$ and $\Gamma$ respectively) and an extended input effect (i.e., a new effect is appended to the
input effect such that the new region is live and accessible to this thread). The rule also checks that the
type and the output effect of $e_2$ do not contain any occurrence of region variable $\rho$. This implies that $\rho$
must be consumed by the end of the scope of $e_2$. The capability manipulation rule (T-CP) checks that $e$
is a handle of a live region $r$. It then modifies the capability count of $r$ as dictated by function $[\![\eta]\!]$, which
increases or decreases the region or the lock count of its argument, according to the value of $\eta$. The
dynamic semantics ensures that an operational step is performed when the actual counts are consistent
with the desired output effect. For instance, if the lock of region $r$ is held by some other executing
thread, the evaluation of $\mathit{cap}_{lk+}$ must be suspended until the lock can be obtained. On the other hand,
the evaluation of $\mathit{cap}_{rg-}$ does not need to suspend but may not be able to physically deallocate a region,
as it may be used by other threads.
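
The static effect of the capability manipulation rule can be sketched in a few lines of Python. This is an illustrative reading of the description above, not the paper's rule: capabilities are reduced to pairs of counts (purity is omitted), $\eta$ is encoded as one of the strings `"rg+"`, `"rg-"`, `"lk+"`, `"lk-"`, a parent of `None` marks a root, and the names `apply_eta` and `cap_effect` are hypothetical.

```python
from typing import Dict, Optional, Tuple

Cap = Tuple[int, int]                            # (region count, lock count); purity omitted
Effect = Dict[str, Tuple[Cap, Optional[str]]]    # region -> (capability, parent); None = root

# How each capability operation changes (region count, lock count).
DELTA = {"rg+": (1, 0), "rg-": (-1, 0), "lk+": (0, 1), "lk-": (0, -1)}


def apply_eta(eta: str, k: Cap) -> Optional[Cap]:
    """Increase or decrease one count of k; None if a count would become negative."""
    d_rg, d_lk = DELTA[eta]
    rg, lk = k[0] + d_rg, k[1] + d_lk
    return (rg, lk) if rg >= 0 and lk >= 0 else None


def is_live(g: Effect, r: str) -> bool:
    """r and all of its ancestors in g have a positive region count."""
    if r not in g:
        return False
    (rg, _), parent = g[r]
    return rg > 0 and (parent is None or is_live(g, parent))


def cap_effect(g: Effect, eta: str, r: str) -> Optional[Effect]:
    """Output effect of applying cap_eta to a handle of r, or None if it does not type check."""
    if not is_live(g, r):              # the handle must denote a live region
        return None
    k, parent = g[r]
    k2 = apply_eta(eta, k)
    if k2 is None:
        return None
    g2 = dict(g)
    g2[r] = (k2, parent)
    # Restore the liveness invariant: drop regions that are no longer live.
    return {x: e for x, e in g2.items() if is_live(g2, x)}
```

Decrementing the region count of a region to zero removes that region and its whole subtree from the output effect, which mirrors the early subtree deallocation discussed in the introduction.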



safety formulation is based on proving the preservation and progress lemmata. Informally, a program
written in our language is safe when for each thread of execution an evaluation step can be performed or
that thread is waiting for a lock (blocked). As discussed in Section 5, a thread may become stuck when it
accesses a region that is not live or accessible (these are obviously the interesting cases in our concurrent
setting; of course a thread may become stuck when it performs a non well-typed operation). Deadlocked
threads are not considered to be stuck.

Definition 1 (Thread Typing) Let $T$ be a collection of threads. Let $R; M; \delta$ be a global typing context,
in which $\delta$ is a mapping from thread identifiers to effects, used only for metatheoretic purposes. For each
thread $n : e$ in $T$, we take $\delta(n)$ to be the input effect that corresponds to the evaluation of expression $e$.
The following rules define well-typed threads.

\[
\frac{}{R; M; \emptyset \vdash_T \emptyset}
\qquad
\frac{R; M; \delta \vdash_T T \quad R; M; \emptyset; \emptyset \vdash e : () \mathbin{\&} (\gamma; \emptyset) \quad n \notin \mathit{dom}(\delta)}{R; M; \delta, n \mapsto \gamma \vdash_T T, n : e}
\]

Definition 2 (Store Consistency) A store $S$ is consistent with respect to an effect mapping $\delta$ when the
following conditions are met:

• Region consistency: the set of region names occurring in the co-domain of $\delta$ is a subset of the set
of region names in $S$.

• Static-dynamic count consistency: for each region, the dynamic region and lock counts of some
thread must be greater than or equal to the corresponding static counts of the same thread.

• Mutual exclusion: only one thread may have a positive lock count in $\delta$ for a particular region $j$.
Additionally, only this thread is allowed to access or lock sub-regions of $j$.

Definition 3 (Store Typing) A store $S$ is well-typed with respect to $R; M; \delta$ (we denote this by
$R; M; \delta \vdash_{str} S$) when the following conditions are met:

• $S$ is consistent with respect to $\delta$,

• the set of region names in $S$ is equal to $R$,

• the set of locations in $M$ is equal to the set of locations in $S$, and

• for each location $\ell$, the stored value $S(\ell)$ is closed and has type $M(\ell)$ with empty effects, i.e.,
$R; M; \emptyset; \emptyset \vdash S(\ell) : M(\ell) \mathbin{\&} (\emptyset; \emptyset)$.

4 Full proofs and a full formalization of our language are given in the companion technical report [9].
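
The three conditions of Definition 2 can be checked mechanically, as the following Python sketch suggests. The data layout is an assumption made for illustration: the static mapping `delta` takes a thread identifier to a dictionary of per-region `(region count, lock count)` pairs, and the store is reduced to its dynamic thread maps. The clause restricting access to sub-regions of a locked region is not modelled here, since it needs the region hierarchy.

```python
from typing import Dict, Tuple

Counts = Tuple[int, int]                    # (region count, lock count)
Static = Dict[int, Dict[str, Counts]]       # delta: thread -> region -> static counts
Dynamic = Dict[str, Dict[int, Counts]]      # store: region -> thread -> dynamic counts


def store_consistent(store: Dynamic, delta: Static) -> bool:
    """Check the conditions of store consistency on a simplified store."""
    # Region consistency: every region mentioned in delta exists in the store.
    for effect in delta.values():
        if not set(effect) <= set(store):
            return False

    # Static-dynamic count consistency: dynamic counts dominate static counts.
    for n, effect in delta.items():
        for r, (s_rg, s_lk) in effect.items():
            d_rg, d_lk = store[r].get(n, (0, 0))
            if d_rg < s_rg or d_lk < s_lk:
                return False

    # Mutual exclusion: at most one thread has a positive static lock count per region.
    regions = {r for effect in delta.values() for r in effect}
    for r in regions:
        holders = [n for n, effect in delta.items() if r in effect and effect[r][1] > 0]
        if len(holders) > 1:
            return False
    return True
```

Note that the count condition is an inequality: the dynamic counts may exceed the static ones, which is the gap between static and dynamic capability counts that the concluding remarks mention as future work.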

Definition 4 (Configuration Typing) A configuration $S; T$ is well-typed with respect to $R; M; \delta$ (we
denote this by $R; M; \delta \vdash_C S; T$) when the collection of threads $T$ is well-typed with respect to $R; M; \delta$ and
the store $S$ is well-typed with respect to $R; M; \delta$.

Definition 5 (Not stuck) A configuration $S; T$ is not stuck when each thread in $T$ can take one of the
evaluation steps in Figure 3 (E-S, E-T or E-SN) or it is waiting for a lock held by some other thread.

Given these definitions, we can now present the main results of this paper. The progress and preservation
lemmata are first formalized at the program level, i.e., for all concurrently executed threads.

Lemma 1 (Progress - Program) Let $S; T$ be a closed well-typed configuration with $R; M; \delta \vdash_C S; T$.
Then $S; T$ is not stuck.

Lemma 2 (Preservation - Program) Let $S; T$ be a well-typed configuration with $R; M; \delta \vdash_C S; T$. If
the operational semantics takes a step $S; T \rightarrow S'; T'$, then there exist $R' \supseteq R$, $M' \supseteq M$ and $\delta'$ such that the
resulting configuration is well-typed with $R'; M'; \delta' \vdash_C S'; T'$.

An expression-level version for each of these two lemmata is required, in order to prove the above. 
At the expression level, progress and preservation are defined as follows. 

Lemma 3 (Progress - Expression) Let $S$ be a well-typed store with $R; M; \delta, n \mapsto \gamma \vdash_{str} S$ and let $e$ be
a closed well-typed redex with $R; M; \emptyset; \emptyset \vdash e : \tau \mathbin{\&} (\gamma; \gamma')$. Then exactly one of the following is true:

• $e$ is of the form $\mathit{cap}_{lk+}\ \mathit{rgn}_j$ and $j$ is a live but inaccessible region to thread $n$, or

• $e$ is of the form $(\lambda x.e_1 \text{ as } \tau\ v)^{par}$, or

• there exist $S'$ and $e'$ such that $S; e \rightarrow_n S'; e'$.

Lemma 4 (Preservation - Expression) Let $e$ be a well-typed expression with $R; M; \emptyset; \emptyset \vdash e : \tau \mathbin{\&} (\gamma; \gamma'')$
and let $S$ be a well-typed store with $R; M; \delta, n \mapsto \gamma \vdash_{str} S$. If the operational semantics takes a step
$S; e \rightarrow_n S'; e'$, then there exist $R' \supseteq R$, $M' \supseteq M$ and $\gamma'$ such that the resulting expression and the resulting
store are well-typed with $R'; M'; \emptyset; \emptyset \vdash e' : \tau \mathbin{\&} (\gamma'; \gamma'')$ and $R'; M'; \delta[n \mapsto \gamma'] \vdash_{str} S'$.

The type safety theorem is a direct consequence of Lemmata 1 and 2. Let function main be the
initial program, let $\iota_H$ be the global heap region and let the initial typing contexts $R_0$ and $\delta_0$ and the initial
program configuration $S_0; T_0$ be defined by the following singleton lists: $R_0 = \{\iota_H\}$, $\delta_0 = \{1 \mapsto \iota_H^{1,0} \triangleright \bot\}$,
$\theta_0 = \{1 \mapsto 1,0\}$, $S_0 = \{\iota_H : (\theta_0, \emptyset, \emptyset)\}$, and $T_0 = \{1 : (\mathit{main}[\iota_H]\ \mathit{rgn}_{\iota_H})^{seq}\}$.

Theorem 1 (Type Safety) If the initial configuration $S_0; T_0$ is well-typed with $R_0; \emptyset; \delta_0 \vdash_C S_0; T_0$ and the
operational semantics takes any number of steps $S_0; T_0 \rightarrow^{n} S_n; T_n$, then the resulting configuration $S_n; T_n$
is not stuck.

The empty (except for $R_0$ that contains only $\iota_H$) contexts that are used when typechecking the initial
configuration $S_0; T_0$ guarantee that all functions in the program are closed and that no explicit region
values ($\mathit{rgn}_{\iota}$) or location values ($\mathit{loc}_{\ell}$) are used in the source of the original program.
8 Related Work

The first statically checked stack-based region system was developed by Tofte and Talpin [13]. Since
then, several memory-safe systems that enabled early region deallocation for a sequential language were
proposed [TJ [121 Q31 13. Cyclone [11] and RC [8] were the first imperative languages to allow safe
region-based management with explicit constructs. Both allowed early region deallocation and RC also
introduced the notion of multi-level region hierarchies. RC programs may throw region-related exceptions,
whereas our approach is purely static. Both Cyclone and RC make no claims of memory safety
or race freedom for concurrent programs. Grossman proposed a type system for safe multi-threading
in Cyclone [10]. Race freedom is guaranteed by statically tracking locksets within lexically-scoped
synchronization constructs. Grossman's proposal allows for fine-grained locking, but only deals with
stack-based regions and does not enable early release of regions and locks. In contrast, we support
hierarchical locking, as opposed to just primitive locking, and bulk region deallocation.

Statically checked region systems have also been proposed [3] [17] [16] for real-time Java to rule
out dynamic checks imposed by the language specification. Boyapati et al. introduce hierarchical
regions in ownership types but the approach suffers from the same disadvantages as Grossman's work.
Additionally, their type system only allows sub-regions for shared regions, whereas we do not have this
limitation. Boyapati also proposed an ownership-based type system that prevents deadlocks and data
races [2]; in contrast to his system, we support locking of arbitrary nodes in the region hierarchy. Static
region hierarchies (depth-wise) have been used by Zhao [17]. Their main advantage is that programs
require fewer annotations compared to programs with explicit region constructs. In the same track,
Zhao et al. [16] proposed implicit ownership annotations for regions. Thus, classes that have no explicit
owner can be allocated in any static region. This is a form of existential ownership. In contrast, we
allow a region to completely abstract its owner/ancestor information by using the hierarchy abstraction
mechanism. None of the above approaches allow full ownership abstraction for region subtrees.

Cunningham et al. [5] proposed a universe type system to guarantee race freedom in a calculus of objects.
Similarly to our system, object hierarchies can be atomically locked at any level. Unlike our system,
they do not support early lock releases and lock ownership transfers between threads. Consequently, their
system cannot encode two important aspects of multi-threaded programming: thread-locality and data
migration. Finally, our system provides explicit memory management and supports separate compilation.

The main limitation of our work is that we require explicit annotations regarding ownership and 
region capabilities. Moreover, our locking system offers coarser-grained locking than most other related 
works. The use of hierarchical locking avoids some, though not all, deadlocks. 

9 Concluding Remarks 

In this paper, we have presented a concurrent language employing region-based memory management
and locking primitives. Regions and locks are organized in a common hierarchy and treated uniformly. 
Our language allows atomic deallocation and locking of entire subtrees at any level of the hierarchy; 
it also allows region and lock capabilities to be transferred between threads, encoding useful idioms of 
concurrent programming such as thread-local data and data migration. The type system guarantees the 
absence of memory access violations and data races in the presence of region aliasing. 

We are currently integrating our system in Cyclone. In the future, we are planning to extend our type 
system to achieve an exact correspondence between static and dynamic capability counts, and provide 
deadlock freedom guarantees. 